A Dynamical Central Limit Theorem for Shallow Neural Networks

Neural Information Processing Systems

Recent theoretical work has characterized the dynamics and convergence properties of wide shallow neural networks trained via gradient descent; the asymptotic regime in which the number of parameters tends to infinity has been dubbed the mean-field limit. At initialization, the randomly sampled parameters lead to a deviation from the mean-field limit that is dictated by the classical central limit theorem (CLT). However, the dynamics of training introduces correlations among the parameters, raising the question of how the fluctuations evolve during training. Here, we analyze the mean-field dynamics as a Wasserstein gradient flow and prove that, in the width-asymptotic limit, the deviations from the mean-field evolution scaled by the width remain bounded throughout training. This observation has implications for both the approximation rate and generalization: the upper bound we obtain is controlled by a Monte-Carlo-type resampling error, which importantly does not depend on dimension. We also relate the bound on the fluctuations to the total variation norm of the measure to which the dynamics converges, which in turn controls the generalization error.
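As a schematic illustration of the setting (the notation below is assumed for exposition and is not fixed by the abstract): a width-$n$ shallow network is an empirical average of features $\psi(x,\theta_i)$ over its parameters, the mean-field limit replaces this average by an integral against a parameter measure $\mu_t$ evolving under a Wasserstein gradient flow, and the object of study is the CLT-scaled fluctuation:

$$
f_{n,t}(x) = \frac{1}{n}\sum_{i=1}^{n} \psi\bigl(x,\theta_i(t)\bigr), \qquad
\bar f_t(x) = \int \psi(x,\theta)\,\mathrm{d}\mu_t(\theta), \qquad
g_{n,t} = \sqrt{n}\,\bigl(f_{n,t}-\bar f_t\bigr).
$$

At initialization the classical CLT makes $g_{n,0}$ of order one; the result described above is that this scaled deviation stays bounded along the training dynamics, giving a Monte-Carlo-type rate $\lVert f_{n,t}-\bar f_t\rVert = O(n^{-1/2})$ whose constant does not depend on the input dimension.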


Review for NeurIPS paper: A Dynamical Central Limit Theorem for Shallow Neural Networks

Neural Information Processing Systems

Weaknesses: Proposition 2.1 is tangential and not new in content or proof technique; much the same was shown in, e.g., [Mei, Misiakiewicz, Montanari '19] and other works building upon it. The proofs of Propositions 3.1 and 3.2, the most meaningful results, are simple calculations making use of the Mean Value Theorem and Duhamel's principle, respectively. Theorem 3.3 is a lot of work for what is not a particularly interesting result: it is asymptotic in both n and t, so it yields no insight into the dynamics, nor into any relationship between n and t. Moreover, it is not truly dimension-free as the authors claim; dimension implicitly shows up in the moments of f, given that \psi is positively homogeneous, so it is rather the case that dimension enters the variance bound as one might expect. Moreover (and this is made clearer by examining the experimental results), it is not useful to reason about optimization time (finite or asymptotic) without reference to a discretization scheme. The experiments refer to "epochs", but there is no optimization algorithm to relate to the flows discussed in the theory.
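For readers unfamiliar with the tool the review points to, the following is the standard statement of Duhamel's principle (a generic sketch, not the paper's specific derivation): if a fluctuation $g_t$ obeys a linear evolution with a forcing term,

$$
\partial_t g_t = A_t\, g_t + h_t, \qquad
g_t = \Phi_{t,0}\, g_0 + \int_0^t \Phi_{t,s}\, h_s\,\mathrm{d}s,
$$

where $\Phi_{t,s}$ is the propagator of the homogeneous equation ($\partial_t \Phi_{t,s} = A_t\,\Phi_{t,s}$, $\Phi_{s,s}=\mathrm{Id}$), then bounds on $\Phi$ and $h$ yield, via a Grönwall-type argument, control of $g_t$ over time.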


Review for NeurIPS paper: A Dynamical Central Limit Theorem for Shallow Neural Networks

Neural Information Processing Systems

The paper provides CLT-like results for the dynamics of wide, single-hidden-layer neural networks in the mean-field limit. The authors also show that under certain conditions the long-time fluctuations can be controlled by a Monte-Carlo-type resampling error. The reviewers had a positive assessment of the finite-width analysis and the strength of some of the technical contributions. They did, however, raise a variety of concerns regarding the asymptotic nature of the results (both in n and t), the assumptions on Dhat, and the lack of results under discretization. While some of these concerns were alleviated by the authors' response, the more critical reviewers maintained their scores and one positive reviewer slightly decreased theirs from 8 to 7. I agree with the reviewers that CLT-type results for finite width are indeed interesting.

